Compressing Big Data: when the rate of convergence to the entropy matters

Author

  • Filippo Mignosi
Abstract

In this talk we discuss the rate of convergence to the entropy of dictionary-based compressors. A faster rate of convergence to the theoretical compression limit should correspond to better compression in practice, but constants also matter; therefore, the analysis of the rate of convergence must also cover the "transient" phase. Concerning dictionary-based compressors, it is known that LZ78-like compressors converge faster than LZ77-like compressors when the texts to be compressed are generated by a memoryless source. In practice, however, LZ77-like compressors seem to perform better. This appears to be due to the effect of an Optimal Parsing strategy (which can be applied in both the LZ77 and the LZ78 case) rather than to the fact that the texts are generated by a memoryless source. To the best of our knowledge, there are no theoretical results concerning the rate of convergence to the entropy of either LZ77 or LZ78 when an Optimal Parsing strategy is used. We discuss some experimental results on LZ78 showing that the rate of convergence to the entropy presents a kind of wave effect that becomes larger and larger as the entropy of the memoryless source decreases; it can become a tsunami for a zero-entropy source.

Copyright © by the paper's authors. Copying permitted only for private and academic purposes. In: Costas S. Iliopoulos, Alessio Langiu (eds.): Proceedings of the 2nd International Conference on Algorithms for Big Data, Palermo, Italy, 7-9 April 2014, published at http://ceur-ws.org/
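To make the quantities discussed in the abstract concrete, the following sketch measures the empirical bits-per-symbol rate of a greedy LZ78 parse on strings drawn from a Bernoulli(p) memoryless source and compares it with the source entropy. This is only an illustration under simple assumptions (greedy parsing and a crude per-phrase code-length bound of ceil(log2(c+1)) + ceil(log2(alphabet)) bits); it is not the Optimal Parsing strategy nor the experimental protocol used by the author.

```python
import math
import random

def lz78_phrases(text):
    """Greedy LZ78 parsing: return the number of phrases in the parse."""
    dictionary = {}          # maps phrase -> index
    current = ""
    phrases = 0
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate
        else:
            dictionary[candidate] = len(dictionary) + 1
            phrases += 1
            current = ""
    if current:              # trailing, incomplete phrase
        phrases += 1
    return phrases

def lz78_bits(text, alphabet_size=2):
    """Crude upper bound on the LZ78 code length:
    each phrase is coded as (index of longest known prefix, new symbol)."""
    c = lz78_phrases(text)
    return c * (math.ceil(math.log2(c + 1)) + math.ceil(math.log2(alphabet_size)))

def bernoulli_entropy(p):
    """Binary entropy (bits per symbol) of a memoryless Bernoulli(p) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

if __name__ == "__main__":
    random.seed(0)
    p = 0.1                                  # low-entropy memoryless source
    h = bernoulli_entropy(p)
    for n in (1_000, 10_000, 100_000, 1_000_000):
        text = "".join("1" if random.random() < p else "0" for _ in range(n))
        rate = lz78_bits(text) / n           # empirical bits per symbol
        print(f"n={n:>8}  empirical rate={rate:.4f}  entropy={h:.4f}")
```

For small n the empirical rate sits well above the entropy (the "transient" phase mentioned above), and the gap closes slowly when p is small, i.e. when the source entropy is low.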




Journal title:

Volume   Issue

Pages  -

Publication date: 2014